Pattern Theoretic Knowledge Discovery
Abstract
Future research directions in Knowledge Discovery in Databases (KDD) include the ability to extract an overlying concept relating useful data. Current limitations involve the search complexity of finding that concept and the question of what it means to be "useful." Pattern Theory research crosses over naturally into this domain. The goal of this paper is threefold. First, we present a new approach to the problem of learning by discovery and robust pattern finding. Second, we explore the current limitations of a Pattern Theoretic approach as applied to the general KDD problem. Third, we exhibit its performance with experimental results on binary functions, and we compare those results with C4.5. This new approach to learning demonstrates a powerful method for finding patterns in a robust manner.

1 The Pattern Theory Approach

Pattern Theory is a discipline that arose out of machine learning [3] [7] and switching theory [6]. The original goal was to develop formal methods of algorithm design from specifications. The approach is based on a technique called function decomposition and a measure called decomposed function cardinality (DFC). Since Pattern Theory is able to extrapolate from available information based on the inherent structure in the data, it is directly related to KDD.

Decomposing a function involves breaking it up into smaller subfunctions. These smaller functions are further broken down until no subfunction can be decomposed any further. For a given function, the number of ways to choose two sets of variables (the partition space) is exponential in the number of variables. The decomposition space is even larger, since there are several ways the subfunctions can be combined and there are several levels of subfunctions possible.

Figure 1: Lookup Table

Figure 2: Decomposition

The complexity measure that we use to determine the relative predictive power of different function decompositions is called DFC.
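To give a feel for why the partition space grows exponentially, the following is a minimal Python sketch that enumerates every way to split a variable set into two groups. It assumes the two sets are disjoint and non-empty and that their order does not matter; the excerpt does not state these conditions, and the function name `two_set_partitions` is hypothetical.

```python
from itertools import combinations


def two_set_partitions(variables):
    """Yield each unordered split of `variables` into two non-empty, disjoint sets.

    Assumption (not stated in the source): the two sets do not overlap.
    """
    variables = list(variables)
    first, rest = variables[0], variables[1:]
    # Pin the first variable to the left set so each unordered split appears once.
    # `size` stops short of len(rest) so the right set is never empty.
    for size in range(len(rest)):
        for chosen in combinations(rest, size):
            left = (first, *chosen)
            right = tuple(v for v in rest if v not in chosen)
            yield left, right


if __name__ == "__main__":
    for n in range(2, 9):
        names = [f"x{i}" for i in range(1, n + 1)]
        count = sum(1 for _ in two_set_partitions(names))
        # Under the assumptions above the count is 2**(n - 1) - 1.
        print(n, count)
```

Under these assumptions a four-variable function already has 7 candidate splits, and the count roughly doubles with each added variable.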
DFC is calculated by adding the cardinalities of each of the subfunctions in the decomposition. The cardinality of an n-variable binary function is $2^n$. We illustrate the measure in Figures 1 and 2. In Figure 1, we have a function on four variables with cardinality $2^4 = 16$. In Figure 2, we show the same function after it has been decomposed. The DFC of this representation of the original function is $2^2 + 2^2 + 2^2 = 12$. The DFC measures the relative complexity of a function. When we search through the possible decompositions for a …
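The DFC arithmetic above can be sketched in a few lines of Python. The figures themselves are not recoverable from this excerpt, so the subfunctions `g1`, `g2`, and `h` and the way they are wired together are hypothetical stand-ins; only the arities (three two-variable subfunctions replacing one four-variable table) are taken from the text.

```python
from itertools import product


def dfc(subfunction_arities):
    """Sum the cardinalities 2**k of binary subfunctions with k inputs each."""
    return sum(2 ** k for k in subfunction_arities)


# Hypothetical two-input subfunctions; the source does not say which
# functions appear in its Figure 2.
def g1(a, b):
    return a ^ b


def g2(c, d):
    return c & d


def h(u, v):
    return u | v


def f(a, b, c, d):
    """Assumed decomposition shape: f(a, b, c, d) = h(g1(a, b), g2(c, d))."""
    return h(g1(a, b), g2(c, d))


# Undecomposed representation: one lookup-table entry per input combination.
lookup_table = {bits: f(*bits) for bits in product((0, 1), repeat=4)}

table_cardinality = len(lookup_table)    # 2**4 entries
decomposed_cardinality = dfc([2, 2, 2])  # 2**2 + 2**2 + 2**2

if __name__ == "__main__":
    print(table_cardinality, decomposed_cardinality)
```

A function with no useful decomposition keeps its full table cardinality, so a DFC well below $2^n$ is what signals structure in the data.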
Similar Resources
Design Pattern Detection by Multilayer Neural Genetic Algorithm
Design Patterns are proven solution to common recurring design problems. Design Pattern Detection is most important activity that may support a lot to re-engineering process and thus gives significant information to the designer. Knowledge of design pattern exists in the system design improves the program understanding and software maintenance. Therefore, an automatic and reliable design patter...
Design Pattern Detection by Sub Graph Isomorphism Technique
Design Patterns are proven solution to common recurring design problems. Design Pattern Detection is most important activity that may support a lot to re-engineering process and thus gives significant information to the designer. Knowledge of design pattern exists in the system design improves the program understanding and software maintenance. Therefore, an automatic and reliable design patter...
Designing an Ontology for Knowledge Discovery in Iran’s Vaccine
Ontology is a requirement engineering product and the key to knowledge discovery. It includes the terminology to describe a set of facts, assumptions, and relations with which the detailed meanings of vocabularies among communities can be determined. This is a qualitative content analysis research. This study has made use of ontology for the first time to discover the knowledge of vaccine in Ir...
Information-Theoretic Measures for Knowledge Discovery and Data Mining
A database may be considered as a statistical population, and an attribute as a statistical variable taking values from its domain. One can carry out statistical and information-theoretic analysis on a database. Based on the attribute values, a database can be partitioned into smaller populations. An attribute is deemed important if it partitions the database such that previously unknown regula...
On the Design of Knowledge Discovery Services Design Patterns and Their Application in a Use Case Implementation
As service-orientation becomes more and more popular in the computer science community, more research is done in applying service-oriented applications in specific fields, including knowledge discovery. In this paper we investigate how the service-oriented paradigm can benefit knowledge discovery, and how specific services and the knowledge discovery process as a whole should be designed. We pr...